Fix composable templates with subobjects: false #97317

Merged
merged 48 commits into from
Sep 13, 2023

Contributor

@eyalkoren eyalkoren commented Jul 3, 2023

Fixes #96768

A fix proposal based on the identified root cause.

The general problem is that the parsing of mapping properties is affected by the subobjects setting, so when multiple mappings are merged and an explicit subobjects setting is encountered, previously parsed mappings become invalid.
What I tried in this PR is to determine whether there is an explicit subobjects setting within a list of mappings before parsing (and merging) them. This allows for consistent parsing and merging of component templates when processing and validating composable index templates.

If this fix proposal is evaluated as a valid path, we still need to add some unit tests, for example:

  • different order of components in the composedOf list
  • contradicting subobjects settings
  • comparing the new bulk merge to the existing sequential merge
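
The pre-scan described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Elasticsearch implementation: the function name `resolve_root_subobjects` and the use of plain dicts for raw mappings are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any, Optional


def resolve_root_subobjects(mappings: list[dict[str, Any]]) -> Optional[bool]:
    """Return the explicit root-level `subobjects` value shared by the raw mappings.

    Returns None when no mapping sets it explicitly and raises ValueError when
    two mappings set contradicting values. (Hypothetical helper.)
    """
    resolved: Optional[bool] = None
    for mapping in mappings:
        if "subobjects" not in mapping:
            continue  # this mapping relies on the default
        value = bool(mapping["subobjects"])
        if resolved is not None and resolved != value:
            raise ValueError("contradicting explicit root-level subobjects settings")
        resolved = value
    return resolved


component_a = {"subobjects": False, "properties": {"a.b": {"type": "keyword"}}}
component_b = {"properties": {"c.d": {"type": "long"}}}

# The resolved value does not depend on the order of the components.
print(resolve_root_subobjects([component_a, component_b]))
print(resolve_root_subobjects([component_b, component_a]))
```

Because the setting is resolved once for the whole list, every mapping in the list could then be parsed under the same assumption, regardless of where the explicit setting appears.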

@eyalkoren eyalkoren added >bug :Search Foundations/Mapping Index mappings, including merging and defining field types labels Jul 3, 2023
@elasticsearchmachine elasticsearchmachine added v8.10.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jul 3, 2023
@elasticsearchmachine
Collaborator

Hi @eyalkoren, I've created a changelog YAML for you.

@eyalkoren eyalkoren marked this pull request as ready for review July 3, 2023 12:10
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jul 3, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

Member

@piergm piergm left a comment

I think the chosen path of determining whether there is an explicit sub-objects setting before merging is the right one! Therefore the solution proposed LGTM. Thanks @eyalkoren for the PR.

Member

@piergm piergm left a comment

LGTM! Thanks for the PR @eyalkoren 👍

@piergm piergm requested review from javanna and romseygeek August 2, 2023 09:01
@romseygeek
Contributor

Does this fix the case when we have conflicting subobjects directives on a submapper (ie not directly on the root)?

I worry a bit that this is a special-case fix for one specific type of possible conflict, and that maybe we need to take a step back and see if there is a better way of doing this more generally. In particular, it might be worth trying to merge the templates as json maps rather than as parsed mappings? As I understand it, v2 templates are supposed to be parseable so that they can be verified when PUT, but I don't think that necessarily means that we need to merge them together as mappings at index creation time.

I think it might also be worth getting some input from @elastic/es-data-management given that they own the template code.

@eyalkoren
Contributor Author

@romseygeek thanks for your input! I agree that there is a bigger issue here, and I'll try to define it in more detail.

Firstly, just to clarify, this is not a fix for the handling of conflicting directives; it is a fix for the failure to merge valid (non-conflicting) mappings. I only added a validation for conflicting explicit subobjects at the root because it's better to fail fast in such a case during parsing, before even trying to merge.

Having said that, and maybe that's your point, it doesn't fix failure to merge non-root valid mappings, such as:

    "mappings" : {
      "properties" : {
        "parent" : {
          "subobjects" : false,
          "type" : "object"
        }
      }
    }

and

    "mappings" : {
      "properties" : {
        "parent" : {
          "properties" : {
            "child.grandchild" : {
              "type" : "keyword"
            }
          }
        }
      }
    }

Merging the two above will also fail.
However, when you think of how composable templates are used, I think this is much less likely than failures caused by a merge of mappings with subobjects: false at the root level. Normally, a component template makes sense on its own, so it is much more likely that in order to achieve the above mappings, a single mapping would include:

    "mappings" : {
      "properties" : {
        "parent" : {
          "subobjects" : false,
          "type" : "object",
          "properties" : {
            "child.grandchild" : {
              "type" : "keyword"
            }
          }
        }
      }
    }

So, while I agree and am even willing to take on the challenge of finding a wider fix, wouldn't you think it's better to eliminate the higher-impact issue first?

> In particular, it might be worth trying to merge the templates as json maps rather than as parsed mappings?

I am not sure what this means, let's discuss this.
I think there is a fundamental problem with automatically parsing dots and creating objects based on them. It's as if we need to teach Elasticsearch all over again that dots are not necessarily an indication of field hierarchy, but may simply be another valid character in field names.
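
To make the ordering problem concrete, here is a toy model (in Python, purely illustrative) of how the same dotted field name yields two different structures depending on whether `subobjects: false` is known at parse time. The function `expand_properties` is hypothetical and ignores everything else a real mapping parser does.

```python
from __future__ import annotations

from typing import Any


def expand_properties(properties: dict[str, Any], subobjects: bool = True) -> dict[str, Any]:
    """Toy model of dotted-name handling, not Elasticsearch's parser.

    With subobjects=True, "a.b" becomes an object "a" containing a leaf "b".
    With subobjects=False, "a.b" stays a single leaf field named "a.b".
    """
    if not subobjects:
        return dict(properties)
    expanded: dict[str, Any] = {}
    for name, field in properties.items():
        node = expanded
        *parents, leaf = name.split(".")
        for parent in parents:
            # Create the intermediate object on first use, then descend into it.
            node = node.setdefault(parent, {"type": "object", "properties": {}})["properties"]
        node[leaf] = field
    return expanded


props = {"child.grandchild": {"type": "keyword"}}
print(expand_properties(props, subobjects=True))
print(expand_properties(props, subobjects=False))
```

If the second mapping above is parsed before the `subobjects: false` on `parent` has been seen, it takes the first shape, which no longer matches what the first mapping declares.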

@eyalkoren
Contributor Author

Summarizing my offline discussion with @romseygeek: we may be able to generically solve this problem by merging mappings in their raw form first, before parsing them, as in XContentHelper#mergeDefaults, possibly with slight changes and only for merges during the composition of index templates.
One change we would need to make to the current merge, for example: we may be OK with simply overriding the value of a field's "type" with a type coming from another mapping, but the current expectation is that a conflict between a field type and the object type is not allowed.

Another alternative is to postpone the object expansion based on . in field names to be done as late as possible, after merging is done, at which time we already have the full picture from all mappings. This may be much more complicated to do.
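
A rough sketch of the first option, merging raw maps before any parsing, is shown below in Python. It is only loosely inspired by the idea of a recursive map merge and is not the code of `XContentHelper#mergeDefaults`; the name `merge_raw` and the rule that the later map wins on scalar conflicts are assumptions chosen for this example.

```python
from __future__ import annotations

import copy
from typing import Any


def merge_raw(base: dict[str, Any], other: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `other` into a copy of `base`; on a scalar conflict `other` wins."""
    merged = copy.deepcopy(base)
    for key, value in other.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_raw(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The two non-root mappings from the earlier comment, in raw (unparsed) form.
first = {"properties": {"parent": {"subobjects": False, "type": "object"}}}
second = {"properties": {"parent": {"properties": {"child.grandchild": {"type": "keyword"}}}}}

merged = merge_raw(first, second)
print(merged)
```

Since nothing is parsed until the merge is complete, the dotted name `child.grandchild` is still an opaque key when `subobjects: false` is combined with it, and the merged map can then be parsed once with the full picture.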

@dakrone your input on this would be very helpful.

@eyalkoren
Contributor Author

Update (for my future self mostly): I added a test with the exact example described above that fails as expected.
Applying @romseygeek's proposal of merging the raw (non-parsed JSON Map<String, Object> representations) mappings through XContentHelper#mergeDefaults fixes the test. I already know it would fail other tests that expect merge exceptions when trying to merge field X with type: object with another mapping in which field X is mapped to a non-object type.
So this approach seems valid so far, but we'd need to implement our own raw merge that is very similar to mergeDefaults but contains all special cases for mappings merges for index templates.

@romseygeek
Contributor

> I already know it would fail other tests that expect merge exceptions when trying to merge field X with type: object with another mapping in which field X is mapped to a non-object type.

I think it's likely that we'll still get exceptions, it's just that they'll be different exceptions? For example, overriding an object mapper with a float mapper will then throw an exception at parse time, because float mappers can't have a properties subfield.

@eyalkoren
Contributor Author

eyalkoren commented Aug 3, 2023

> I think it's likely that we'll still get exceptions, it's just that they'll be different exceptions? For example, overriding an object mapper with a float mapper will then throw an exception at parse time, because float mappers can't have a properties subfield.

It will then depend on the order of the merge. XContentHelper#mergeDefaults simply merges without knowing anything about what it is merging, so it doesn't know that it is merging mappings and that the type key may require special handling or validation.
When I ran MetadataIndexTemplateServiceTests, testIndexTemplateFailsToOverrideComponentTemplateMappingField() failed because it expected an exception during validation, which did not occur because the merged raw map contained type: object, which is OK with properties. On the other hand, testUpdateComponentTemplateFailsIfResolvedIndexTemplatesWouldBeInvalid() failed due to a different exception, where a text field is not expected to have a properties subfield, as you say.

The point is that it is not so much about how tests will fail, but that we'll have to reproduce the same behavior in raw mapping merge as we get with parsed mappings merge.
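
One way to reproduce that behavior independently of merge order would be a mapping-aware check that runs on the raw maps alongside the merge. The Python sketch below is an illustration only: `is_object` and `check_object_conflicts` are hypothetical names, and treating a field without a `type` as an object is an assumption of this example.

```python
from __future__ import annotations

from typing import Any


def is_object(field: dict[str, Any]) -> bool:
    """Treat a field as an object when typed "object" or when it has no type (assumption)."""
    return field.get("type", "object") == "object"


def check_object_conflicts(left: dict[str, Any], right: dict[str, Any], path: str = "") -> None:
    """Raise ValueError if a field is an object in one raw mapping and a non-object in the other."""
    left_props = left.get("properties", {})
    right_props = right.get("properties", {})
    for name in left_props.keys() & right_props.keys():
        full_name = f"{path}.{name}" if path else name
        a, b = left_props[name], right_props[name]
        if is_object(a) != is_object(b):
            raise ValueError(f"cannot merge object and non-object mappings for [{full_name}]")
        if is_object(a):
            check_object_conflicts(a, b, full_name)  # both are objects: compare their children


component = {"properties": {"field": {"type": "object", "properties": {"inner": {"type": "keyword"}}}}}
override = {"properties": {"field": {"type": "text"}}}

for pair in ((component, override), (override, component)):
    try:
        check_object_conflicts(*pair)
    except ValueError as error:
        print(error)
```

The check is symmetric, so the same conflict is reported whichever mapping comes first, unlike a plain key-by-key merge whose outcome depends on which value ends up in the merged map.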

@eyalkoren eyalkoren requested a review from dakrone August 24, 2023 13:34
Member

@dakrone dakrone left a comment

I took an initial look and it looks pretty good. I left only a couple of comments, and tomorrow I plan to spend some time manually testing this. I definitely think someone on the mapping side should take a look, as I'm only an expert on the template side, not the mapping side.

@eyalkoren
Contributor Author

@dakrone when adding the custom merge javadoc, I realized that we never really went carefully over all mapping parameters to identify which ones we want to preserve when we merge, as we plan to do with subobjects. For example, what about dynamic, which takes effect on an entire subtree? Are there any other parameters that affect a field mapping subtree rather than only a specific field's mapping?

Also - are there other mapping parameters that can be set at the root level (like subobjects)?
I know that ignore_malformed can be set at the root level, but since it is done through the settings node, it won't be a problem. Note that any parameter under _doc other than subobjects will be lost during merge with the proposed algorithm.

@eyalkoren
Contributor Author

@elasticsearchmachine run elasticsearch-ci/bwc

@eyalkoren eyalkoren requested a review from dakrone September 3, 2023 06:38
Contributor

@romseygeek romseygeek left a comment

LGTM, thanks for all the work on this @eyalkoren

Member

@jbaiera jbaiera left a comment

The changes LGTM, though I'm far from an expert in mapping stuff. The template changes seem harmless enough. Left just a couple small comments.

romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Sep 11, 2023
note that this should cause CI failures until
elastic#97317 is merged
@eyalkoren eyalkoren dismissed dakrone’s stale review September 13, 2023 08:35

Addressed with the new raw mapping merge

@chiarch84

Dear all,
after reading this thread it is not clear to me what to do in order to have my subobject mappings merged rather than overridden.
In version 8.9.2 the following component templates were merged correctly. Now from version 8.10 on, the third one overrides the second one even though the subobjects fields are not conflicting. Could you please help me to understand how I can make it work in the new versions? Thanks!

PUT _component_template/bdap_id
{
  "version": 1,
  "template": {
    "mappings": {
      "dynamic": false,
      "_source": {
        "enabled": true
      },
      "properties": {
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  },
  "_meta": {
    "description": "Composable template for storing text ids"
  }
}
PUT _component_template/bdap_stac_itemproperties
{
  "version": 2,
  "template": {
    "mappings": {
      "dynamic": false,
      "_source": {
        "enabled": true
      },
      "properties": {
        "properties": {
          "properties": {
            "title": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              }
            },
            "description": {
              "type": "text"
            }
          }
        }
      }
    }
  },
  "_meta": {
    "description": "Composable template for storing STAC item properties"
  }
}
PUT _component_template/bdap_stac_itemproperties_sat
{
  "version": 2,
  "template": {
    "mappings": {
      "dynamic": false,
      "_source": {
        "enabled": true
      },
      "properties": {
        "properties": {
          "properties": {
            "sat:absolute_orbit": {
              "type": "integer"
            },
            "sat:relative_orbit": {
              "type": "integer"
            }
          }
        }
      }
    }
  },
  "_meta": {
    "description": "Composable template for storing STAC item properties for sat: extension"
  }
}

Here is the final index_template:

PUT _index_template/bdap_template_items_stac
{
  "version": 1,
  "priority" : 5,
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "dynamic": "strict",
      "_source": {
        "enabled": true,
        "includes": [],
        "excludes": []
      },
      "_routing": {
        "required": false
      },
      "dynamic_templates": []
    },
    "aliases": {
      "bdap": {}
    }
  },
  "index_patterns": [
    "bdap-items-stac*",
    "bdap-test-items-stac*",
    "bdap-dev-items-stac*"
  ],
  "composed_of": [
    "bdap_id",
    "bdap_stac_itemproperties",
    "bdap_stac_itemproperties_sat"
  ],
  "_meta": {
    "description": "Composable template for storing STAC items used for PROD, TEST and DEV indexes"
  }
}

If I try to PUT a first document in version 8.10 I get an error, while in 8.9.2 the new index was created correctly by merging the mappings.

PUT bdap-dev-items-stac-001/_doc/1
{
    "id": "Landcover.ESA.WorldCover2020.V1.item.VRT_ESA_WorldCover_10m_2020_v100_InputQuality",
    "properties": {
      "title": "ESA_WorldCover_10m_2020_v100_InputQuality.vrt",
      "description": "xcgsdfgsdfgsd",
      "sat:absolute_orbit": 45938,
      "sat:relative_orbit": 38
    }
}

@eyalkoren
Contributor Author

@chiarch84 thanks for reporting!
At first glance, I think this is related to the specific case of merging mappings of a field named properties. I could have sworn I took this option into consideration, but I can't find any evidence for it.
Please verify that this is indeed the case (for example, try only renaming your properties field and see if this resolves the issue) and if so, please open a separate issue with the same description as above and with the error you get.

@eyalkoren
Contributor Author

I finally got to actually look and there is indeed a test case I added for it, but maybe yours is a specific scenario that is not properly covered.
Once I get to it, I'll test myself, but if you get the chance to verify as I asked, it would be useful.

Labels
>bug external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.11.0
Development

Successfully merging this pull request may close these issues.

Composable template fails with subobjects: false and mappings in different component templates
9 participants